Abstract. We develop an algorithm to transform one given permutation into
another given permutation. The algorithm is based on three basic DNA edit- 
ing operations suggested by a model for ciliate micro nuclear decryption. We 
define the distance between the two permutations to be the number of ciliate 
operations our algorithm performs during such a transformation. Combining 
well-known clustering methods with this distance function enables one to de- 
termine phylogenies. We apply these ideas to explore the relationships among 
the chromosomes of eight fruitfly (drosophila) species by using the well-known 
UPGMA algorithm on the distance function provided by our algorithm. 

Over evolutionary time "local" DNA editing events such as nucleotide substi- 
tutions, deletions or insertions diversify the set of DNA sequences present in or- 
ganisms. Results of whole genome sequencing suggest that also "global" DNA 
editing events affect the diversification of DNA sequences. In particular: Some 
pairs of species have the same genes positioned in different locations of their
chromosomes. This phenomenon is observed in both somatic genomes and in
mitochondrial genomes in several branches in the tree of life. Figure 1 illustrates the
phenomenon for 11 synteny blocks of orthologous genes in the X chromosome of 
human and mouse. 




Figure 1. The permutation between 11 synteny blocks of the human
and the mouse X chromosomes. The orientation of the synteny
blocks has been ignored. See Figure 2 of [15].
Since Dobzhansky and Sturtevant's works [9] and [22] on fruitfly genomes in the
1930s it has been popular to use inversions¹ as the primary "global" DNA
sequence editing operation to describe phylogenetic relationships among the genomes
of fruit flies. In recent years transpositions and block interchanges have also been
considered as possible "global" DNA sequence editing operations: [8], [4], [15], [23].

An insightful phylogenetic analysis that includes fine structural elements of re- 
versals is given in [6]. [6] addresses the question whether a reversal can occur at 
arbitrary locations in the genome of an organism. Certain locations, which would 
disrupt the coding region of an essential gene, would not be observed in extant or- 
ganisms. Similarly, locations that negatively affect the fitness of organisms would 
disappear over time due to "purifying selection". Additionally, certain sequence 
motifs may actually promote DNA recombination that results in a genome
rearrangement. For example, [8] reports a correlation between breakpoints associated
with rearrangements and repetitive DNA, and the review [13] discusses a similar
correlation between rearrangements in bacterial genomes and repetitive DNA.
These considerations suggest that genome rearrangement events that lead to the di- 
verse genomes we observe in nature are not arbitrary, but constrained by contexts. 
In this paper we explore the use of context directed DNA recombination events to 
analyze genome rearrangements and to construct a phylogeny based on these. 

On p. 1661 of [6], in the discussion of the selection of genes to which their analysis of
rearrangements in fruitfly genomes applies, the authors indicate that genes deemed
to have been relocated by a transposition rather than a reversal have been explicitly
removed from the analysis. Thus, the analysis of [6] features reversals exclusively.
On the other hand, the analysis of rearrangements in the genomes of two nematode 
species by [5] includes reversals, transpositions and translocations. In this paper 
we explore the use of reversals combined with block interchanges (both constrained 
by contexts) in the analysis of rearrangements. 

Experimental results from ciliate laboratories present us with examples of DNA 
editing operations that routinely occur during developmental processes in these 
organisms. The textbook [10] and the two surveys [15] and [30] give a good starting
point for information about these "ciliate operations" and about the corresponding
biological background. We shall call the yet to be fully identified system in ciliates
that accomplishes micro nuclear decryption² the ciliate decryptome.

We shall illustrate how to use "ciliate operations" to deduce potential phylogenetic
relationships from genome rearrangement phenomena. Previous work, including
[3], [5], [12], used unconstrained reversals to deduce phylogenetic relationships.
Our main ideas are to use ciliate genomic elements to model two genomes related 
by permutations of locations and orientations of orthologous genes, to apply the 
context directed DNA operations of the ciliate decryptome to define a distance 
function between the relevant permuted genomes, and to then use a classical dis- 
tance based algorithm to derive phylogenies. Of the several different distance based 
algorithms available we selected the UPGMA algorithm³. We apply these ideas to
the chromosomes of eight species of fruit flies (Drosophila) to obtain a phylogeny
for each of the chromosomes. We use the estimated divergence timeline of a known 
phylogenetic analysis to calibrate the metric on which our analyses are based. 



1 Sometimes inversions are called reversals. 

2 Some details regarding this process are given below in Section 1. 

3 Descriptions of UPGMA can be found in the online Chapter 27 of [B], or in the textbook [7].
In addition to the feature of including context based constraints, the use of ciliate
operations as the basis for deriving a distance function has another attractive
feature: The ciliate decryptome is programmable [17], and the computational steps
taken by the decryptome can be monitored under laboratory conditions [16]. Thus,
there are extant organisms that are poised to be employed as DNA computing de- 
vices naturally equipped to determine phylogenetic relationships among permuted 
genomes. 

Our paper is organized as follows: In Section 1 we briefly describe ciliate nu- 
clear duality. This duality is the basis for modeling pairs of genomes related by 
permutation as genetic elements of the ciliate genome. Then in Section 2 we briefly 
describe the context directed DNA operations of the ciliate decryptome, after which 
in Section 3 we abstract the mathematical notion of a pointer list. Section 3 con- 
tains a mathematical analysis of properties of pointer lists. In Section 4 we model 
relevant features of the ciliate decryptome's DNA operations by mathematical op- 
erations on pointer lists. This section analyzes the mathematical properties of the 
introduced operations on pointer lists. The main conclusion of Sections 3 and 4
is that the operations, applied repeatedly to pointer lists, converge to fixed points
that terminate the process. Lemma 4 of Section 3 guarantees that pointer lists of
length larger than 4 are transformed by the operations of Section 4, which is the content
of Theorem 6 of Section 4. Theorem 7 of Section 4 establishes that the notion of
a pointer list is preserved by the three transforming operations. In Section 5 we
describe an algorithm, which we call the HNS algorithm, that uses these operations
on pointer lists to compute the distance between chromosomes that are related by
permutation. The work in Sections 3 and 4 proves that this algorithm halts. We
also indicate that the algorithm runs in time polynomial in the length of a pointer 
list. In Section 6 we use data downloaded from flybase.org and the HNS- and UP- 
GMA algorithms to construct phylogenies over eight species for each of the fruitfly 
chromosomes. In the closing Section 7 we discuss possible future directions related 
to this work. 

1. Ciliates and nuclear duality

A ciliate is a single cell eukaryote that hosts two types of nuclei: one type, 
the macro nucleus, contains the transcriptionally active somatic genome, while 
the other type, the micro nucleus, contains a transcriptionally silent germline-like 
genome. The micro nuclear genome is, in the technical sense of the word, an 
encrypted version of the macro nuclear genome. Special events in the ciliate life 
cycle predictably trigger conjugation between a pair of mating-compatible cells. 
Conjugation results in what amounts to a Diffie-Hellman exchange⁴ between two
conjugants, the formation of a new micro nucleus in each, and the decryption of one
or more copies of the new micro nuclear genome to establish a replacement macro
nuclear genome, while in each conjugant the instances of its pre-existing genome 
are discarded. Readers interested in a thorough survey of ciliate nuclear duality 
could consult [T§] . 

The relationship between micro and macro nuclear DNA 



4 A Diffie-Hellman exchange is a cryptographic protocol for secure exchange of a secret key in 
a hostile environment. The conjugants exchange a haploid copy of the germline genome, which is 
an encrypted version of the somatic genome. 
To describe the experimentally observed relationship between the micro nuclear
and macro nuclear DNA molecules, consider the following figure: 
Figure 2. The top diagram depicts a possible micro nuclear precursor,
and the bottom diagram is another possible micro nuclear
precursor of the macro nuclear gene in the middle diagram.

The micro nuclear DNA sequences in the top and the bottom rows of Figure 2
each have three types of regions: The yellow blocks, labeled with letters, are called
internal eliminated sequences (IESs). The blocks labeled with numbers are called
macro nuclear destined sequences (MDSs), while the patterned segments are called
pointers. The corresponding macro nuclear sequence in the middle row of Figure 2
contains only some of the pointers present in its micro nuclear precursor, and
all the MDSs, but none of the IESs of the micro nuclear precursor. In the macro
nuclear sequence these components are assembled in a specific order, which we call
the canonical order. Examination of the micro nuclear precursors reveals that there
are two copies of each pointer: For example, MDS 2 has a pointer on the left flank
that is identical to the pointer on the right flank of MDS 1. This pointer will be
called the "1-2 pointer". And MDS 2 has a pointer on its right flank which is
identical to the pointer on the left flank of MDS 3. This pointer is called the "2-3
pointer". The other pointers are named similarly. Also note that MDS 1 does not
have a pointer on its left flank, and MDS 5 does not have a pointer on its right flank.
As the bottom row of Figure 2 depicts, in the micro nuclear precursor an MDS plus
its flanking pointer(s), as a unit, can be in a 180-degree rotated orientation relative to the
corresponding components in the macro nuclear gene.

In "shorthand" the micro nuclear precursor in the top row of Figure 2 is

(1) [4, 2, 1, 3, 5] 

while the micro nuclear precursor in the bottom row of Figure 2 is

(2) [4, 2, 1, -3, 5]. 

2. The ciliate DNA operations 

We now turn to the actual ciliate algorithm that processes micro nuclear pre- 
cursors to produce their corresponding macro nuclear versions. We do not examine 
the biochemical foundations for this algorithm here. The journal articles [1] and
[21] propose hypotheses about biochemical processes that perform the needed DNA
editing events taking place during this decryption process. 
The textbook [10] describes three DNA editing operations that underlie this
decryption process. There is some experimental evidence for the hypothesis that
these three operations are at work during the decryption process. For example, the
journal article [16] gives, for specific examples, experimental data about the
DNA products of intermediate steps of the ciliate algorithm. We shall assume as
a given that the three operations that produce macro nuclear molecules from their
micro nuclear precursors are as proposed in [10]: context directed block interchanges
(swaps), context directed reversals and context directed excisions.

Context directed block interchanges (swaps): In Figure 3 the symbols p and
q denote pointers. This pointer context for X and Z permits swapping X and Z
while preserving their internal orientations if, and only if, in the configuration in
the top line of Figure 3 no instance of pointer p or of pointer q is flanked by two
MDSs.






Figure 3. Context Directed Block Swaps: The p...q...p...q pointer 
context permits swapping the DNA segments X and Y. 



Context directed reversal: In Figure 4 one instance of pointer p is the Watson-Crick
dual of the other instance. The segment "A" flanked by these two instances
may be reversed (replaced by its Watson-Crick dual) unless an instance of p is 
flanked on both sides by MDSs. 



Figure 4. Context Directed Reversal: The -p...p or p...-p pointer 
context permits rotating the segment A flanked by them through 
180 degrees. 



Context directed excision: In Figure 5 the pointer p flanks an IES between two
consecutive numbered MDSs: 



Figure 5. Context Directed Excision: The IES flanked by pointer 
p on both sides is removed with one copy of p. 
Observe that context directed block interchanges and context directed reversals
do not decrease or increase the length of the string they operate on, and they
preserve the pointer contexts. But context directed excision, as illustrated in Figure 5,
changes the pointer contexts by deleting selected pointers and IESs. This operation
has the effect of consolidating separated blocks of macro nuclear destined DNA into 
single blocks, and amounts to a "clean-up" and "termination" operation. 

3. Pointer lists 

Pointers are an essential ingredient of the three DNA editing operations. We 
exploit this central role of pointers by now basing our computational formalism 
(that mathematically models these three ciliate operations) on pointers. Towards 
this end we introduce the notion of a pointer list: 

Definition 1. A finite sequence $P := [x_1, \ldots, x_m]$ of integers is said to be a pointer
list if it satisfies the following six conditions:

(1) $m$ is an even positive integer;

(2) there is a unique $i$ with $\mu = |x_i| = \min\{|x_j| : 1 \le j \le m\}$;

(3) there is a unique $j$ with $\lambda = |x_j| = \max\{|x_i| : 1 \le i \le m\}$;

(4) for each $i \in \{1, \ldots, m\}$ with $\mu < |x_i| < \lambda$, there is a unique $j \in \{1, \ldots, m\} \setminus \{i\}$ such that $|x_i| = |x_j|$;

(5) for each odd $i \in \{1, \ldots, m\}$, $x_i < x_{i+1}$ and $x_i \cdot x_{i+1} > 0$;

(6) whenever $i \in \{1, \ldots, m\}$ is odd, there is no $j$ such that $|x_i| < |x_j| < |x_{i+1}|$
or $|x_{i+1}| < |x_j| < |x_i|$.

Stipulation (6) will also be called the exclusion property below. In our discussion
below, the item $x_i$ with minimal absolute value will be called "the minimal element"
of the pointer list, while the item $x_j$ with largest absolute value will be called "the
maximal element" of the pointer list. We shall use the symbol $E$ to denote the entry
with smallest absolute value, and $E'$ to denote the entry with largest absolute value.
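The six stipulations of Definition 1 can be checked mechanically. The following is a minimal Python sketch of such a check; the function name `is_pointer_list` is a hypothetical helper introduced only for illustration, and positions are 0-based in the code while the definition is 1-based.

```python
def is_pointer_list(xs):
    """Check stipulations (1)-(6) of Definition 1 for a list of integers."""
    m = len(xs)
    # (1) the length is an even positive integer
    if m == 0 or m % 2 != 0:
        return False
    abs_vals = [abs(x) for x in xs]
    mu, lam = min(abs_vals), max(abs_vals)
    # (2), (3) the least and the largest absolute value each occur exactly once
    if abs_vals.count(mu) != 1 or abs_vals.count(lam) != 1:
        return False
    # (4) every other absolute value occurs exactly twice (an entry and its mate)
    if any(abs_vals.count(a) != 2 for a in abs_vals if mu < a < lam):
        return False
    # 1-based odd positions i correspond to 0-based even indices k
    for k in range(0, m, 2):
        a, b = xs[k], xs[k + 1]
        # (5) the pair is increasing and both entries have the same sign
        if not (a < b and a * b > 0):
            return False
        # (6) exclusion property: no absolute value strictly between the pair
        low, high = sorted((abs(a), abs(b)))
        if any(low < v < high for v in abs_vals):
            return False
    return True


print(is_pointer_list([4, 5, 2, 3, 1, 2, -4, -3, 5, 6]))
print(is_pointer_list([1, 3, 2, 4]))
```

The first call tests the sequence obtained by listing each MDS of the shorthand [4, 2, 1, -3, 5] as a pair of flanking values; the second list fails stipulation (4), since the values 2 and 3 have no mates.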

We will also adopt the following terminology for expository ease: 

Definition 2. Two items $x_i$ and $x_j$ in a pointer list are

(1) a pair if $|i - j| = 1$, and $\min\{i, j\}$ is odd.

(2) mates if $|x_i| = |x_j|$.

Lemma 1. If $x_i$ and $x_j$ are items in a pointer list and $i$ and $j$ are distinct but have
the same parity, then $x_i \ne x_j$.

Proof. Let $i$ and $j$ be distinct elements of $\{1, 2, \ldots, m\}$ that have the same parity.

Towards deriving a contradiction, assume that contrary to the claim $x_i = x_j$.

Both $i$ and $j$ are odd. Then by stipulations (4) and (5) we find that $x_i < x_{i+1}$ and
$x_j < x_{j+1}$, and all these items are of the same sign. But then stipulation (6) implies
that $x_{i+1} = x_{j+1}$. Let $A$ be $x_i (= x_j)$ and let $B$ be $x_{i+1} (= x_{j+1})$. Then our pointer
list is of the form

$[\ldots, A, B, \ldots, A, B, \ldots]$

and no pointer has absolute value strictly between the absolute values of $A$ and $B$.
Since there is a unique pointer of minimal absolute value and there are two pointers
of value $A$, $A$ is not the pointer of minimal absolute value. It follows that an odd
number of pointers have their absolute values below that of $A$ (the minimal element
and then pairs of mates). Similarly the unique pointer of maximal absolute value is
not $B$, and there is an odd number of pointers with absolute values exceeding that
of $B$. Since these pointers also all occur in pairs, for some pair $x_k, x_{k+1}$ one pointer
has absolute value smaller than that of $A$, while the other has absolute value larger
than that of $B$, constituting a violation of stipulation (6).

Both $i$ and $j$ are even. Then by stipulations (4) and (5) we find that $x_{i-1} < x_i$ and
$x_{j-1} < x_j$, and all these items are of the same sign. But then stipulation (6) implies
that $x_{i-1} = x_{j-1}$. Let $A$ be $x_{i-1} (= x_{j-1})$ and let $B$ be $x_i (= x_j)$. As before our
pointer list is of the form

$[\ldots, A, B, \ldots, A, B, \ldots]$

and no pointer has absolute value strictly between the absolute values of $A$ and $B$.
By the same argument as in the case when $i$ and $j$ were odd, we now derive a
violation of stipulation (6). □

Lemma 2. For all distinct $i$ and $j$ for which $|x_i| = |x_j|$, the following are equivalent:

(1) $x_i$ and $x_j$ have the same sign.

(2) $i$ and $j$ have distinct parity.

Proof. (1) implies (2): Assuming $x_i = x_j$, the contrapositive of the implication in
Lemma 1 gives that $i$ and $j$ have distinct parity.

(2) implies (1): We are now assuming that $|x_i| = |x_j|$, and that $i$ and $j$ have
distinct parity. Suppose that, contrary to (1), $x_i$ and $x_j$ are of opposite sign. We
may assume, without loss of generality, that $i$ is odd (and thus $j$ is even). By
stipulations (4) and (5) we have that $x_i < x_{i+1}$ and $x_{j-1} < x_j$.

Case 1: $x_i$ is positive. Then we have $|x_i| < |x_{i+1}|$, and $|x_j| < |x_{j-1}|$. As $|x_i| = |x_j|$,
stipulation (6) implies that $|x_{i+1}| = |x_{j-1}|$. Letting $A$ denote $|x_i| (= |x_j|)$, and
letting $B$ denote $|x_{i+1}| (= |x_{j-1}|)$, we find that as in the proof of Lemma 1 we have
a pointer list of the form

$[\ldots, A, B, \ldots, -B, -A, \ldots]$

where no pointer has absolute value strictly between $A$ and $B$. As before an odd
number of pointers have absolute value less than $A$, and an odd number have
absolute value larger than $B$, and some pointer pair includes one from each of these
two categories, constituting a violation of stipulation (6).

Case 2: $x_i$ is negative. Similar considerations show a violation of stipulation (6). □

Lemma 3. There are no pointer lists of the form $[B, C, \ldots, C, B]$.

Proof. Note that by stipulation (1) of the pointer list definition the parity of the 
leftmost instance of B is odd, while the parity of the rightmost instance of B is even. 
The reverse applies to the two instances of C. By stipulation (5) of the pointer list 
definition we would have B < C and C < B, whence there are four copies of B in 
the pointer list, violating stipulation (4) for pointer lists. □ 

Lemma 4. If $[x_1, x_2, \ldots, x_{m-1}, x_m]$ is a pointer list of length larger than 4, then
at least one of the following three statements is false:

(a) $(\forall i)(x_i \ne x_{i+1})$

(b) $(\forall i)(\forall j)$(If $|x_i| = |x_j|$, then $x_i = x_j$)

(c) $(\forall i)(\forall j)(\forall k)(\forall \ell)$(If $i \ne k$, $j \ne \ell$, $i < j$ and $x_i = x_k$ and $x_j = x_\ell$, then either
$i < j < \ell < k$ or $i < k < j < \ell$)

Proof. Assume that contrary to the claim of the lemma, there exists a pointer list
of length larger than 4 which also satisfies properties (a), (b) and (c).

By (b) any two pointer entries that are mates are of the same sign. Thus by
Lemma 2 the positions in which mates occur are of opposite parity. By (a), no
mates form a pair. By (c), the only relative configurations possible between two
sets of mates are

(3) $[\ldots A \ldots A \ldots B \ldots B \ldots]$

and

(4) $[\ldots A \ldots B \ldots B \ldots A \ldots]$

Sublemma A: Configurations of the form $[\ldots, A, \ldots, A, \ldots, B, \ldots, B, \ldots, C, \ldots, C, \ldots]$
are impossible.

Proof of Sublemma A: By (a) there are $x_a$, $x_b$ and $x_c$ with $x_a$ between the
two copies of $A$, $x_b$ between the two copies of $B$ and $x_c$ between the two copies of
$C$. We may assume that we have selected the $A$, $B$ and $C$ for which the number of
items between consecutive copies of the same symbol is minimal in each case. Thus,
there is no $D$ such that both copies of $D$ are between the two copies of $A$, or of $B$
or of $C$. But at least one of $x_a$, $x_b$ or $x_c$ differs from the minimal and from the
maximal element of the pointer list, and thus has a partner symbol (a mate) (of the
same sign, by (b)) located in the pointer list. Assume it is $x_a$ (the argument is the
same for the other cases): Then the other copy of $x_a$ does not occur between the
two copies of $A$. But then the two copies of $A$ and the two copies of $x_a$ constitute
a violation of (c). This completes the proof of Sublemma A. □

It follows that if we have a pointer list satisfying (a), (b) and (c), then for all
distinct triples $A$, $B$ and $C$ that are not the minimal or maximal elements of the
pointer list we have configurations of only the following two general forms:

(5) $[\ldots, A, \ldots, B, \ldots, C, \ldots, C, \ldots, B, \ldots, A, \ldots]$ or

(6) $[\ldots, A, \ldots, C, \ldots, C, \ldots, A, \ldots, B, \ldots, B, \ldots]$.

Sublemma B: No pointer list of length larger than 4 is of any of the forms

(i) $[x, \ldots, y]$,

(ii) $[x, y, \ldots]$,

(iii) $[\ldots, x, y]$,

where $\{|x|, |y|\} = \{E, E'\}$.

Proof of Sublemma B: For suppose some pointer list is of one of these forms.
Choose a pair of mates, say of value $A$, with the fewest possible pointers between
them. Thus, we have a pointer list of one of these three forms, which contains a
pattern $[\ldots, A, \ldots, A, \ldots]$ and the number of pointers between the two copies of
$A$ is as small as possible. Suppose $B$ is a pointer appearing between these two $A$'s.
Since $B$ is not $E$ or $E'$ we have by property (c) two copies of $B$ appearing between
these two $A$'s, contradicting the minimality condition on the number of pointers
between two mates. This concludes the proof of Sublemma B.
Sublemma C: If there is a pointer list of the form $[A, \ldots, B]$ where $A$ and $B$ each
have mates, then it is of the form $[A, \ldots, A, B, \ldots, B]$.

Proof of Sublemma C: The mate of the initial $A$ must be in a position $i$ which
is even, and the mate of the terminal $B$ must be in a position $j$ which is odd. By
property (c) we have $i < j$. Because $i$ is even and $j$ is odd, if $j$ is not $i + 1$, then
there are a positive even number of pointers between the second copy of $A$ and the
first copy of $B$. These pointers cannot have absolute value $E$ or $E'$, for otherwise
there would be between the two copies of $A$, or else between the two copies of $B$, a
configuration of the form $\ldots, C, C, \ldots$, which is forbidden by (a). But then there
is some other pointer $C$ between the $A$ in position $i$ and the $B$ in position $j$. By (c)
we must have the mate of $C$ also between the $A$ in position $i$ and the $B$ in position $j$,
meaning the pointer list is of the form $[A, \ldots, A, \ldots, C, \ldots, C, \ldots, B, \ldots, B]$,
and this contradicts Sublemma A. This concludes the proof of Sublemma C.

Sublemma D: Configurations as in (5) are impossible.

Proof of Sublemma D: By Lemma 3 only the following are possibilities for (5):

(a) $[E, B, C, \ldots, A, E', A, \ldots, C, B]$

(b) $[B, C, \ldots, A, E', A, \ldots, C, B, E]$

(c) $[B, E, C, \ldots, A, E', A, \ldots, C, B]$

(d) $[B, C, \ldots, A, E', A, \ldots, D, C, E, B]$

But then the parity of the positions of the two copies of $B$ in (a) and in (b) are
the same, while the parity of the positions of $C$ in (c) and (d) are the same. This
contradicts Lemma 2. This completes the proof of Sublemma D. □

Sublemma E: Configurations as in (6) are impossible.

Proof of Sublemma E: Consider configuration (6):

$[\ldots, A, \ldots, C, \ldots, C, \ldots, A, \ldots, B, \ldots, B, \ldots]$.

Neither $|x_1|$ nor $|x_m|$ can be a member of $\{E, E'\}$, since this will allow a configuration
of the form $\ldots, D, D, \ldots$ occurring between the two copies of $A$ or the two
copies of $B$, contradicting (a). By Sublemma C configurations as in (6) must be
of the form $[A, \ldots, C, \ldots, C, \ldots, A, B, \ldots, B]$. To avoid a contradiction with
premise (a), this configuration must be of the form

$[A, \ldots, C, \ldots, x_i, \ldots, C, \ldots, A, B, \ldots, x_j, \ldots, B]$,

where $\{|x_i|, |x_j|\} = \{E, E'\}$. Applying premise (a) again we see that for each
pointer $D$ between positions 1 and $i$, its mate is in the corresponding position
between position $i$ and the position of the mate of $A$. The same remark applies to
the segment between the two $B$'s of the pointer list. Thus the pointer list is of the
form

$[A_1, A_2, \ldots, A_k, x_i, A_k, \ldots, A_2, A_1, B_1, B_2, \ldots, B_t, x_j, B_t, \ldots, B_2, B_1]$

But then both copies of $A_1$ are in odd positions, contradicting Lemma 1. This
completes the proof of Sublemma E, and thus of Lemma 4. □
Examples of pointer lists.

Let $\mathbb{Z}$ denote the set of integers. For an integer $z$ we define

\[ z(1) = \begin{cases} z & \text{if } z = |z| \\ z - 1 & \text{otherwise} \end{cases} \]

and in all cases $z(2) = z(1) + 1$.

For a set $S$ the symbol ${}^{<\omega}S$ denotes the set of finite sequences with entries from
$S$. Define the function $\pi : {}^{<\omega}\mathbb{Z} \to {}^{<\omega}\mathbb{Z}$ by:

$\pi([z_1, \ldots, z_k]) = [z_1(1), z_1(2), \ldots, z_k(1), z_k(2)]$

Thus, for example, $\pi([-1, 4, 3, 5, 2, -9, 7, 10, -8, 6])$ is the sequence

[-2, -1, 4, 5, 3, 4, 5, 6, 2, 3, -10, -9, 7, 8, 10, 11, -9, -8, 6, 7].
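The map $\pi$ is straightforward to compute. The following Python sketch is offered only as an illustration; the names `split_entry` and `pi` are hypothetical, and the code reproduces the example above.

```python
def split_entry(z):
    """Return the pair (z(1), z(2)) for an integer z."""
    # z(1) = z when z equals |z|, and z - 1 otherwise; z(2) = z(1) + 1
    first = z if z == abs(z) else z - 1
    return first, first + 1


def pi(zs):
    """Replace each entry z of the sequence by the two entries z(1), z(2)."""
    out = []
    for z in zs:
        out.extend(split_entry(z))
    return out


print(pi([-1, 4, 3, 5, 2, -9, 7, 10, -8, 6]))
# [-2, -1, 4, 5, 3, 4, 5, 6, 2, 3, -10, -9, 7, 8, 10, 11, -9, -8, 6, 7]
```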

Lemma 5. For each finite sequence $M := [s_1, s_2, \ldots, s_n]$ of non-zero integers
such that there is an integer $m$ for which $\{|s_i| : 1 \le i \le n\} = \{m+1, \ldots, m+n\}$,
the sequence $\pi(M)$ is a pointer list.

Proof. The sequence $[|s_1|, |s_2|, \ldots, |s_n|]$ is a permutation of the numbers $m + 1$
through $m + n$. Note that $m \ge 0$.

From the definition of $\pi$ we have for each $j$ that

\[ (s_j(1), s_j(2)) = \begin{cases} (m+i,\ m+i+1) & \text{if } s_j = m+i \\ (-(m+i)-1,\ -(m+i)) & \text{if } s_j = -(m+i) \end{cases} \]

Thus for each odd indexed entry $x_j$ in $\pi(M)$, the absolute values $|x_j|$ and $|x_{j+1}|$
are successive positive integers, meaning that stipulation (6) in the definition of a
pointer list is satisfied by $\pi(M)$. It is also evident that stipulation (1) is satisfied.
Also note that the smallest absolute value obtained by terms of $\pi(M)$ is $m + 1$,
and this is achieved by exactly one entry of $\pi(M)$. Similarly, the largest absolute
value achieved is $m + n + 1$, and is achieved by exactly one term in $\pi(M)$. Thus
stipulations (2) and (3) are satisfied. Towards stipulation (4), consider an entry $x_i$
of $\pi(M)$ which is not of least or largest absolute value. Note that then we have
$m+2 \le m+t = |x_i| \le m+n$. Choose $j$ such that $x_i = s_j(1)$, or $x_i = s_j(2)$. Find the
$k$ for which $|s_k| = m+t-1$, and also find the $\ell$ for which $|s_\ell| = m+t+1$. Then $|x_i|$ is
equal to exactly one of $s_k(1)$, $s_k(2)$, $s_\ell(1)$ or $s_\ell(2)$. Thus, stipulation (4) is satisfied.
To see stipulation (5), observe that for any odd $i$, $\{x_i, x_{i+1}\} = \{s_j(1), s_j(2)\}$, and
thus these two entries have the same sign. □



4. The ciliate operations on pointer lists 

We now introduce three special functions, cde, cds and cdr, from ${}^{<\omega}\mathbb{Z}$ to ${}^{<\omega}\mathbb{Z}$,
inspired by the three ciliate operations, as follows: For a given finite sequence

$P := [x_1, \ldots, x_m]$,

Context Directed Excision:

\[ \mathrm{cde}(P) = \begin{cases} P & \text{if there is no } i \text{ with } x_i = x_{i+1} \\ [x_1, \ldots, x_{i-1}, x_{i+2}, \ldots, x_m] & \text{for } i \text{ minimal with } x_i = x_{i+1} \text{ otherwise.} \end{cases} \]
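As a minimal sketch, the excision operation can be written in Python as follows. The function name `cde` mirrors the notation of the text; the 0-based indexing is an implementation detail of the sketch.

```python
def cde(xs):
    """Context directed excision: delete the first adjacent equal pair, if any."""
    for k in range(len(xs) - 1):
        if xs[k] == xs[k + 1]:
            # drop x_i and x_{i+1} for the minimal i with x_i = x_{i+1}
            return xs[:k] + xs[k + 2:]
    # no adjacent equal entries: the sequence is unchanged
    return list(xs)


p = [1, 2, 2, 3, 3, 4, 4, 5]
while cde(p) != p:
    p = cde(p)
print(p)  # [1, 5]
```

Repeated application to the sequence [1, 2, 2, 3, 3, 4, 4, 5] removes one adjacent pair per step and stops at [1, 5], which cde leaves unchanged.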
Context Directed Reversal:

\[ \mathrm{cdr}(P) = \begin{cases} P & \text{if there are no } i < j \text{ with } x_i = -x_j \\ [x_1, \ldots, x_{i-1}, x_i, -x_j, \ldots, -x_{i+1}, x_{j+1}, \ldots, x_m] & \text{for } i \text{ minimal with } x_i = -x_j \text{ for a } j > i \text{ otherwise.} \end{cases} \]
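The following Python sketch transcribes the displayed formula for cdr literally: the entry $x_i$ stays in place and the block $x_{i+1}, \ldots, x_j$ is reversed and negated. It is an illustration under stated assumptions, not a definitive implementation. In particular, taking the smallest admissible $j$ for the minimal $i$ is an assumption of the sketch (in a pointer list the mate of $x_i$ is unique, so there is only one candidate), and the code does not check that its input is a pointer list.

```python
def cdr(xs):
    """Context directed reversal, transcribed from the displayed formula."""
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            if xs[i] == -xs[j]:
                # keep x_1..x_i, then -x_j, ..., -x_{i+1}, then x_{j+1}..x_m
                flipped = [-v for v in reversed(xs[i + 1:j + 1])]
                return xs[:i + 1] + flipped + xs[j + 1:]
    # no i < j with x_i = -x_j: the sequence is unchanged
    return list(xs)


print(cdr([1, 2, -4, -3, 2, 3]))  # [1, 2, -4, -3, -3, -2]
```

In this example the entries -3 and 3 sit in positions 4 and 6, and the reversal brings the two copies of -3 next to each other.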



Context Directed Block Swaps:

$\mathrm{cds}(P) = P$ if there are no $i < j < k < \ell$ with $x_i = x_k$ and $x_j = x_\ell$. However if
there are $i < j < k < \ell$ with $x_i = x_k$ and $x_j = x_\ell$, then choose the least such $i$, and
define $\mathrm{cds}(P)$ to be

$[x_1, \ldots, x_i, x_k, \ldots, x_\ell, x_j, \ldots, x_{k-1}, x_{i+1}, \ldots, x_{j-1}, x_{\ell+1}, \ldots, x_m]$
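A minimal Python sketch of the block swap follows. It selects the lexicographically least quadruple $(i, j, k, \ell)$, which is the choice made in the proof of Theorem 7; that tie-breaking rule and the 0-based indexing are assumptions of the sketch. The blocks $x_{i+1}, \ldots, x_{j-1}$ and $x_k, \ldots, x_\ell$ trade places while $x_j, \ldots, x_{k-1}$ stays between them.

```python
def cds(xs):
    """Context directed block swap for the least (i, j, k, l) with the p..q..p..q pattern."""
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if xs[k] != xs[i]:
                    continue
                for ell in range(k + 1, n):
                    if xs[ell] == xs[j]:
                        # x_1..x_i, x_k..x_l, x_j..x_{k-1}, x_{i+1}..x_{j-1}, x_{l+1}..x_m
                        return (xs[:i + 1] + xs[k:ell + 1] + xs[j:k]
                                + xs[i + 1:j] + xs[ell + 1:])
    # no such quadruple: the sequence is unchanged
    return list(xs)


print(cds([1, 2, 3, 4, 2, 3, 4, 5]))  # [1, 2, 2, 3, 3, 4, 4, 5]
```

Here the pattern 2, 3, ..., 2, 3 occupies positions 2, 3, 5 and 6, and the swap makes each of those entries adjacent to its mate.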

These three operations behave rather well on the subset $\mathrm{PL} = \{\sigma \in {}^{<\omega}\mathbb{Z} :
\sigma \text{ is a pointer list}\}$ of their domain.

Theorem 6. If $P$ is a pointer list of length larger than 4, then at least one of the
following statements is true:

(1) $\mathrm{cde}(P) \ne P$;

(2) $\mathrm{cdr}(P) \ne P$;

(3) $\mathrm{cds}(P) \ne P$.

Proof. Let $P = [x_1, x_2, \ldots, x_{m-1}, x_m]$ be a pointer list. By Lemma 4 at least one
of the following three statements is false:

(a) $(\forall i)(x_i \ne x_{i+1})$

(b) $(\forall i)(\forall j)$(If $|x_i| = |x_j|$, then $x_i = x_j$)

(c) $(\forall i)(\forall j)(\forall k)(\forall \ell)$(If $i \ne k$, $j \ne \ell$, $i < j$ and $x_i = x_k$ and $x_j = x_\ell$, then either
$i < j < \ell < k$ or $i < k < j < \ell$)

If statement (a) fails, then for some $i$ we have $x_i = x_{i+1}$. We may assume this $i$
is the minimal such. Then $\mathrm{cde}(P) = [x_1, \ldots, x_{i-1}, x_{i+2}, \ldots, x_m] \ne P$.

If statement (b) fails, then fix an $i$ and $j$ with $|x_i| = |x_j|$, but $x_i \cdot x_j < 0$. Then
$\mathrm{cdr}(P) = [x_1, \ldots, x_i, -x_j, -x_{j-1}, \ldots, -x_{i+1}, x_{j+1}, \ldots, x_m] \ne P$.

If neither (a) nor (b) fails, then it must be that (c) fails. As (c) fails, fix $i$,
$j$, $k$ and $\ell$ witnessing this failure. We may assume that $i < j$ and $x_i = x_k$,
and $x_j = x_\ell$, and that $i \ne k$ and $j \ne \ell$. Observe that by stipulation (4) in the
definition of a pointer list we have $x_i \ne x_j$, and thus $k \ne \ell$. Since (c) fails, the two
configurations claimed by (c) are false for our witness, so we have that $i < j < k < \ell$,
or $i < \ell < k < j$, or $\ell < i < j < k$. In each case an application of cds results in a
sequence $\mathrm{cds}(P) \ne P$. □

Theorem 7 (Pointer list preservation). Let $P = [x_1, \ldots, x_m]$ be a pointer list.
Then each of $\mathrm{cde}(P)$, $\mathrm{cdr}(P)$ and $\mathrm{cds}(P)$ is a pointer list.

Proof. We first verify that the equivalence of Lemma 2 is preserved by an application
of cde, cdr or cds to a pointer list.

Operation cde: We need to consider only the case when $\mathrm{cde}(P) \ne P$. Fix the
smallest $i$ such that $x_i = x_{i+1}$. Then we have

$\mathrm{cde}(P) = [x_1, \ldots, x_{i-1}, x_{i+2}, \ldots, x_m]$.

The parity of the position of each surviving term is the same as before, since the
position number changed by 0 or by 2. Since cde does not affect the signs of the
terms in the original pointer list, the equivalence of Lemma 2 still holds for $\mathrm{cde}(P)$.
Operation cdr: Now assume that $\mathrm{cde}(P) = P$ and $\mathrm{cdr}(P) \ne P$. Choose the least $i$
such that for a $j > i$ we have $x_i = -x_j$. By Lemma 2, $i + j$ is even. Now

$\mathrm{cdr}(P) = [x_1, \ldots, x_i, -x_j, -x_{j-1}, \ldots, -x_{i+1}, x_{j+1}, \ldots, x_m]$,

and either $i$ and $j$ are even, or else $i$ and $j$ are odd. Thus there are an even number
of terms moved and signs changed in this application of cdr. Each of these terms
also is moved to a position whose position number is of opposite parity of the
original position number. Thus, the equivalence of Lemma 2 still holds of $\mathrm{cdr}(P)$.

Operation cds: We may assume that $\mathrm{cde}(P) = P$. Suppose that $\mathrm{cds}(P) \ne P$, and
choose the lexicographically least $(i, j, k, \ell)$ such that $i < j < k < \ell$ and $x_i = x_k$,
and $x_j = x_\ell$. By Lemma 2, $i$ and $k$ have opposite parity, and $j$ and $\ell$ have opposite
parity. Then $\mathrm{cds}(P)$ is equal to

$[x_1, \ldots, x_{i-1}, x_i, x_k, \ldots, x_\ell, x_j, x_{j+1}, \ldots, x_{k-1}, x_{i+1}, \ldots, x_{j-1}, x_{\ell+1}, \ldots, x_m]$.

Case 1: $i$ is even and $j$ is odd. By Lemma 2 we then also have that $k$ is odd and
$\ell$ is even. Thus, an even number of blue terms are swapped with an even number of
red terms, and the parities of the positions of all terms remain the same. Since no
signs are changed during an application of cds, the equivalence of Lemma 2 remains
true of $\mathrm{cds}(P)$.

Case 2: $i$ is even and $j$ is even. By Lemma 2 we have that $k$ and $\ell$ are both odd.
In this case an odd number of blue terms are swapped with an odd number of red
terms, and no signs are changed. Since the terms' positions have the same parities
as before, it follows that the equivalence of Lemma 2 still holds of $\mathrm{cds}(P)$.

The cases when $i$ is odd and $j$ even, or when $i$ is odd and $j$ is odd, use similar
arguments.

What remains to be proved is that the result of applying any of cde, cdr or cds 
to the pointer list P is again a pointer list. 

Since neither cdr nor cds changes the number of terms of the list, and since
cde deletes exactly two terms or none, the result has an even number of terms.
Thus stipulation (1) in the definition of a pointer list is preserved. Since the terms
least and largest in absolute value are unique, and only cde removes consecutive
terms that are equal, these two terms survive all applications of cde, cdr or cds.
Thus stipulations (2) and (3) in the definition of pointer lists are preserved by these
operations.

Since only cde removes terms that are adjacent and equal, and since none of the 
operations cde, cdr or cds affects the absolute value of any term, also stipulation 
(4) in the definition of pointer lists is preserved by these operations. 

We must verify stipulations (5) and (6).
Consider cde(P): Suppose $i$ is minimal with $x_i = x_{i+1}$.
Case 1: $i$ is odd:

Then $cde(P) = [x_1, x_2, \dots, x_{i-1}, x_{i+2}, \dots, x_{m-1}, x_m]$. Stipulation (6)
remains true since the removal of the two consecutive terms does not change the
parity of the remaining indices, and thus does not affect the truth of stipulation (6)
for the remaining terms. The same reason shows that stipulation (5) remains true for
cde(P).

Case 2: $i$ is even:

Now by stipulation (5) we see that $x_{i-1} < x_i = x_{i+1} < x_{i+2}$, and these terms
have the same sign. Upon applying cde, we have $[x_1, \dots, x_{i-1}, x_{i+2}, \dots, x_m]$
and all stipulations of the definition of pointer list are still satisfied. We verify
stipulation (6): By the exclusion property there are no $x_j$ with absolute value
between the absolute values of $x_{i-1}$ and $x_i$, and no $x_j$ with absolute value
between the absolute values of $x_{i+1}$ and $x_{i+2}$. Thus upon the removal of
$x_i = x_{i+1}$, there is no $x_j$ with absolute value between the absolute values of
$x_{i-1}$ and $x_{i+2}$. It follows that cde(P) still satisfies stipulation (6).

Consider cdr(P): Suppose $i$ is minimal such that for a $j > i$ we have $x_i = -x_j$.
Application of cdr to P yields

\[ cdr(P) = [x_1, \dots, x_i, -x_j, -x_{j-1}, \dots, -x_{i+1}, x_{j+1}, \dots, x_m]. \]

If $i$ is even then by Lemma [5] $j$ is even and so $x_{j-1} < x_j$, implying that
$-x_j < -x_{j-1}$, and by stipulation (6) there is no term from cdr(P) with absolute
value between the absolute values of $-x_j$ and $-x_{j-1}$. Since $i$ is even we
similarly have $x_{i-1} < x_i$ and there are no terms in cdr(P) with absolute value
between the absolute values of $x_{i-1}$ and $x_i$. Also, as $i$ is even $i + 1$ is odd,
and so $-x_{i+1}$ is in an even parity position, and there still are no terms of cdr(P)
with absolute value between the absolute values of $-x_{i+2}$ and $-x_{i+1}$.

If $i$ is odd, then by Lemma [2] $j$ is odd. Thus $-x_{i+1} < -x_i = x_j < x_{j+1}$ and by
stipulation (6) there are no terms of P with absolute value between $|x_i|$ and
$|x_{j+1}|$. Similarly there are none with absolute value between $|x_i|$ and $|x_{i+1}|$.
But then, aside from $x_i = -x_j$, there are no terms of cdr(P) with absolute values
between $|x_{i+1}|$ and $|x_{j+1}|$. Since stipulation (6) for the other indices is not
affected by cdr, it follows that cdr(P) still satisfies stipulation (6). Parity and
sign arguments show that cdr(P) still meets stipulation (5) of the pointer list
definition.

Consider cds(P): Choose the lexicographically least $(i, j, k, \ell)$ such that
$i < j < k < \ell$ and $x_i = x_k$ and $x_j = x_\ell$. Then cds(P) is

\[ [x_1, \dots, x_{i-1}, x_i, x_k, \dots, x_\ell, x_j, x_{j+1}, \dots, x_{k-1}, x_{i+1}, \dots, x_{j-1}, x_{\ell+1}, \dots, x_m]. \]

To verify stipulations (5) and (6) for cds(P), given P satisfies stipulations (5) and 
(6), we argue as follows: 

Case 1: $i$ is even and $j$ is odd. By Lemma [2] $k$ is odd and $\ell$ is even. Thus, an
even number of blue terms are swapped with an even number of red terms, and
the parities of the positions of all terms remain the same and no signs are changed
during an application of cds. Since the parities are preserved, stipulation (5) is
preserved. To see that stipulation (6) is preserved observe that no pairs of the form
$x_t, x_{t+1}$ with $t$ of odd parity are disrupted by this instance of cds.

Case 2: $i$ is even and $j$ is even. By Lemma [2] $k$ and $\ell$ are both odd. In this
case an odd number of blue terms are swapped with an odd number of red terms,
and no signs are changed. The only pair of the form $x_t, x_{t+1}$ with $t$ odd that is
disrupted is the case when $t = \ell$. Since $\ell$ is odd and $j$ is even, $j - 1$ is odd
and we have $x_{j-1} < x_j = x_\ell < x_{\ell+1}$. It follows that stipulation (5) is still
true, and that stipulation (6) still holds of cds(P). □

A finite sequence $\sigma$ is a fixed point of a function
$F : {}^{<\omega}\mathbb{Z} \to {}^{<\omega}\mathbb{Z}$ if $F(\sigma) = \sigma$.
It follows from the definitions that if P is a pointer list of length larger than 4 and
not a fixed point of $F \in \{cdr, cds\}$, then $F(P)$ is not a fixed point of cde.

5. The HNS algorithm

Call a pointer list a destination if it is one of the following: $[\mu, \lambda]$,
$[-\lambda, -\mu]$, and for integers $z$ with $|z| \notin \{\lambda, \mu\}$,
$[z, \lambda, \mu, z]$ and $[z, -\mu, -\lambda, z]$. Our results imply
that if $P_0$ is a derived pointer list which is not a fixed point of any of the three
ciliate operations, and if we define $P_{i+1}$ as the result of applying cde repeatedly,
starting with $P_i$, until a fixed point of cde is obtained, or else as $cds(P_i)$ if $P_i$
is not a fixed point of cds, or else as $cdr(P_i)$ if $P_i$ is a fixed point of both cde
and cds, then the sequence

\[ P_0, P_1, \dots, P_i, \dots \]

terminates in a destination.

Thus the following algorithm, which we call the HNS algorithm, halts: 

1) Input: A pointer list P;

2) While $cde(P) \neq P$, apply cde. Else, proceed to 3).

3) If the result has length greater than 4, proceed to 4). Else,
terminate the algorithm.

4) If P is not a fixed point of cds, apply cds, and then return to 2).
Else, proceed to 5).

5) If P is not a fixed point of cdr, apply cdr and then return to 2).
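The steps above can be sketched in Python as follows. This is a minimal
illustration, not the authors' in-house software: the function names, the
representation of a pointer list as a plain Python list of integers, the
choice of the least matching j in cdr, and the operation counters are
assumptions made here. The three operations follow the formulas for cde(P),
cdr(P) and cds(P) used in the proof above, and the example inputs in the usage
note are hypothetical lists chosen only to exercise each branch.

```python
def cde(p):
    """Delete the first pair of adjacent equal terms; return p unchanged if none."""
    for i in range(len(p) - 1):
        if p[i] == p[i + 1]:
            return p[:i] + p[i + 2:]
    return p


def cdr(p):
    """Least i with some j > i and p[i] == -p[j]: reverse and negate x_{i+1}..x_j."""
    n = len(p)
    for i in range(n):
        for j in range(i + 1, n):
            if p[i] == -p[j]:
                flipped = [-x for x in reversed(p[i + 1:j + 1])]
                return p[:i + 1] + flipped + p[j + 1:]
    return p


def cds(p):
    """Lexicographically least i<j<k<l with p[i]==p[k], p[j]==p[l]: swap the blocks."""
    n = len(p)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if p[k] != p[i]:
                    continue
                for l in range(k + 1, n):
                    if p[l] == p[j]:
                        # x_1..x_i, x_k..x_l, x_j..x_{k-1}, x_{i+1}..x_{j-1}, rest
                        return (p[:i + 1] + p[k:l + 1] + p[j:k]
                                + p[i + 1:j] + p[l + 1:])
    return p


def hns(pointer_list):
    """Run steps 1)-5); return the final list and the number of operations used."""
    p = list(pointer_list)
    counts = {"cde": 0, "cds": 0, "cdr": 0}
    while True:
        # Step 2: apply cde until a fixed point of cde is reached.
        q = cde(p)
        while q != p:
            p = q
            counts["cde"] += 1
            q = cde(p)
        # Step 3: stop once the list has length at most 4.
        if len(p) <= 4:
            return p, counts
        # Step 4: prefer a context directed swap.
        q = cds(p)
        if q != p:
            p = q
            counts["cds"] += 1
            continue
        # Step 5: otherwise try a context directed reversal.
        q = cdr(p)
        if q != p:
            p = q
            counts["cdr"] += 1
            continue
        # Safety guard (an addition of this sketch): no operation applies.
        return p, counts
```

For instance, `hns([1, 2, 3, 4, 2, 3, 4, 5])` performs one cds followed by
three cde steps, and `hns([1, 2, -3, -2, 3, 4])` performs one cdr followed by
two cde steps. The quadruple search in `cds` is a direct transcription of the
definition and is not optimized.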

Suppose the original length of the pointer list is $2n$.
In step 2, the algorithm examines $2n - 1$ pairs. The result is either of length at
most $2n - 2$, or it is of length $2n$.

In step 3, compute the length of the output of step 2). This takes at most $2n$ steps.
In step 4 the algorithm starts with a position $i < 2n$, and then chooses a position
$k > i + 1$ with $x_i = x_k$, if any.

This takes at most $(2n - 1) + (2n - 2) + \cdots + 2$ search steps. If this fails,
proceed to step 5.

Else, suppose a successful $i + 1 < k < 2n$ is found. Then for $i < j < k$ search
for an $\ell > k$ with $x_j = x_\ell$. This would require at most $(k - i)(2n - k)$ steps.
If this fails, proceed to step 5. Else, execute a cds based on the found quadruple
$(i, j, k, \ell)$, and return to step 2. This step is completed within $2(2n)^2$ steps.
In step 5 the algorithm starts with a position $i < 2n$ and then scans positions
$j > i$ until it finds an $x_j = -x_i$. The worst case scenario for this search is also
$(2n - 1) + (2n - 2) + \cdots + 2$. If the search fails, return to step 2. Else, the
search succeeds, and in at most $2n - 1$ steps the result of cdr is obtained. Now
return to step 2.

In one cycle of executing steps until return to step 2, the worst case scenario
employs at most $2n + 2(2n)^2 + (2n)^2 + 2n - 1 = 3(2n)^2 + 2(2n) - 1$ search and
execution steps. For the next round the upper bound is $3x^2 + 2x - 1$ evaluated
at $x = 2n - 2$. This continues for at most $n$ rounds. Thus a global upper bound, in
terms of the length of the initial pointer list, is $O(n^3)$.
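The cubic bound can be made explicit by summing the per-round bound. The
following worked sum is a sketch under the assumptions stated in the text,
namely that round $t$ starts from a list of length at most $2n - 2t$ and that
there are at most $n$ rounds; the substitution $m = n - t$ is introduced here
only for convenience.

```latex
\[
\sum_{t=0}^{n-1} \Bigl( 3(2n-2t)^2 + 2(2n-2t) - 1 \Bigr)
  = \sum_{m=1}^{n} \bigl( 12m^2 + 4m - 1 \bigr)
  = 2n(n+1)(2n+1) + 2n(n+1) - n
  = 4n(n+1)^2 - n .
\]
```

The leading term is $4n^3$, which agrees with the $O(n^3)$ estimate. As a
check, for $n = 1$ the sum is $15 = 3 \cdot 2^2 + 2 \cdot 2 - 1$.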

The reader will note that the efficiency of this algorithm, which produces from an
initial pointer list a fixed point for the operations cde, cds and cdr in $O(n^3)$
steps, can probably be much improved. Additionally, this algorithm most likely does
not minimize the number of steps taken, using cde, cds and cdr, to reduce a pointer
list to a fixed point.

In our phylogenetic application below, any calibration of time span in terms
of the number of operations required is based on the above HNS algorithm as
computational standard for the calibration.

6. An application to genome phylogenetics

As illustrated in Figure [1], for organisms A and B there may be a set of genes
for which both A and B have orthologs on a corresponding segment of their
chromosomes. Choose A as reference and number the orthologs of these genes in
their 5' to 3' order of appearance on A's chromosome as $1, 2, 3, \dots, n$. In species
B the orthologs of these same genes may appear in a different order, and individual
orthologs may also appear in orientation opposite from the orientation in A. Write
the corresponding list of numbers in their order of appearance on B's chromosome,
making the number negative if the gene orientation is opposite to that in A. The
result is a signed permutation of the list $1, 2, 3, \dots, n$.
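The construction of the signed permutation can be sketched as follows. The
input format (a list of (gene, strand) pairs per species, with strands written
as "+" or "-") and the function name are assumptions made for this
illustration; they are not taken from the flybase.org data format or from the
authors' software.

```python
def signed_permutation(reference, other):
    """Signed permutation of `other` relative to `reference`.

    Both arguments are lists of (gene, strand) pairs in 5' to 3' order and
    must contain the same genes. Genes are numbered 1..n by their position in
    `reference`; a number is negated when the strand in `other` differs from
    the strand in `reference`.
    """
    number = {gene: k for k, (gene, _) in enumerate(reference, start=1)}
    ref_strand = dict(reference)
    return [
        number[gene] if strand == ref_strand[gene] else -number[gene]
        for gene, strand in other
    ]


# Hypothetical three-gene example.
species_a = [("g1", "+"), ("g2", "-"), ("g3", "+")]
species_b = [("g3", "+"), ("g1", "-"), ("g2", "-")]
print(signed_permutation(species_a, species_b))  # [3, -1, 2]
```

With A as reference, the signed permutation of A itself is always the identity
list 1, 2, ..., n, so only the list for B carries information.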

Now imagine that the list of genes for A are the MDS's of a ciliate macro nu- 
clear gene, while the corresponding signed permutation is that gene's micro nuclear 
precursor. Take the number of operations the ciliate decryptome performs to con- 
vert the micro nuclear precursor to its macro nuclear version as a measure of the 
evolutionary distance between the two genome segments of A and B. We used the 
HNS algorithm to simulate the actions of the ciliate decryptome on the set of highly 
permuted genomes from various species of fruit flies.

Methods: 

The fruitfly genome is organized in six chromosomes called "Muller elements"
and named A, B, C, D, E, F. Using data obtained from flybase.org we examined the
permutation structure of these for the eight species D. melanogaster, D. yakuba,
D. erecta, D. sechellia, D. mojavensis, D. simulans, D. grimshawi and D. virilis.

As illustrated in Figure 3 of [6], reproduced below in Appendix III as Figure [20],
there is a transposition of genes between Muller elements B and C for D. erecta,
one of the species in our sample. We therefore combined Muller elements B and C
into one computational unit for our application, and refer to the five units A, B/C,
D, E and F in the remainder of this discussion.

For all species we considered, orthologous genes appeared on the same one of
these five units. For each of the units we computed, using in-house developed
software written in Python, the number of applications of context directed swaps
or context directed reversals performed by the HNS algorithm to permute the gene
order of one species to produce the corresponding gene order of another species.
This was done with each species considered as reference species. Since HNS gives
preference to block interchanges, the number of reversals in our derived data is
low. Appendix I contains these results for each of the five units in Figures [7], [9],
[11], [13] and [15]. An entry in the format "r:s" in row i and column j of a table is
interpreted as follows: "r" denotes the number of context directed reversals (cdr
operations), while "s" denotes the number of context directed block interchanges
(cds operations) executed by the HNS algorithm to convert the permutation of the
species in row i to that of the species in column j. Thus the species in column j is
the reference species. The total for whole genomes is given in Figure [17].

From this data about the number of context directed swaps and reversals we
define a corresponding distance matrix by using the formula $s + \frac{r}{2}$.^5 As
the reader would observe from examining our data, this in fact does define a metric.
The tables describing these metrics are given in Appendix I as Figures [8], [10],
[12], [14] and [16]. The metric for the whole genome is given in Figure [18].

^5 There are strong grounds for equating the value of two reversals with that of a
single swap. As computations show, the result, Figure [8], is a matrix that is
symmetric over its diagonal, and thus it defines a metric.
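The conversion from "r:s" entries to distances can be sketched as below. The
table values are hypothetical placeholders, not the data of Appendix I, and the
helper names are assumptions of this sketch. Exact fractions are used so that
the symmetry check is not affected by rounding.

```python
from fractions import Fraction


def distance(entry):
    """Convert an 'r:s' entry (r reversals, s swaps) to the distance s + r/2."""
    r, s = (int(part) for part in entry.split(":"))
    return s + Fraction(r, 2)


def distance_matrix(table):
    """Apply `distance` to every entry of a square table of 'r:s' strings."""
    return [[distance(entry) for entry in row] for row in table]


def is_symmetric(matrix):
    """Check that the matrix equals its transpose."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


# Hypothetical counts for three species (row = source, column = reference).
counts = [
    ["0:0", "2:5", "1:3"],
    ["2:5", "0:0", "0:4"],
    ["1:3", "0:4", "0:0"],
]
dist = distance_matrix(counts)
print(dist[0][1], dist[0][2], is_symmetric(dist))  # 6 7/2 True
```

Symmetry is the property reported in footnote 5; a full check of the metric
axioms would also test the triangle inequality on the resulting matrix.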

Then we applied the unweighted pair group method with arithmetic mean, also
known as the UPGMA algorithm^, to these metrics. We used an in-house developed
MAPLE implementation of UPGMA to compute these phylogenies. The
corresponding phylogenetic trees are given in Figure [6] below. These trees were
drawn using the "newicktree" package for the LaTeX typesetting system.
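For readers unfamiliar with UPGMA, the clustering step can be sketched in a
few lines of Python. This is an illustrative re-implementation, not the MAPLE
code used for the paper; the tree representation (nested tuples of the form
(left, right, height)) and the tie-breaking rule (first minimal pair found)
are assumptions of this sketch, and the distances in the example are
hypothetical.

```python
def upgma(names, dist):
    """Cluster by UPGMA; return a nested tuple (left, right, height).

    `dist` is a symmetric matrix (list of lists) indexed like `names`.
    The height of a merged cluster is half the distance between its parts.
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Size-weighted average distance from the new cluster to each other one.
        merged = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            merged[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(merged)
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical distances among three species.
tree = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
print(tree)  # ('C', ('A', 'B', 1.0), 3.0)
```

Because UPGMA places every leaf at the same distance from the root, it
implicitly assumes a uniform rate of change along all lineages.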

We used Figure [19] from the DroSpeGe web site^, reproduced in Appendix II
below, to calibrate the time line in our phylogenetic trees. This calibration is a
rough time line: Our work describes evolutionary relationships among instances of 
a specific chromosome present in these eight species. The evolutionary time line 
for a chromosome need not agree with the evolutionary time line for speciation. 
According to Figure [19] the time span from the earliest common ancestor of our 
species is roughly 40 million years. 

Discussion 

By using the UPGMA algorithm to construct phylogenies from distance matrices
we assumed a uniform rate of evolution for the Muller elements. Comparing these
uniform rates among the different chromosomes indicates that no two individual
chromosomes undergo permutations at the same rate. Our results suggest the
following permutation rates for these chromosomes, measured in number of ciliate
operations (co) per million years (my):



Computational unit    Rate (in co/my)
A                     6.5692
B/C                   4.6044
D                     4.8680
E                     8.7625
F                     0.1938
Whole Genome          22.4995



These numbers, which can be taken as stability coefficients, indicate that the Muller 
F element has undergone remarkably few permutations in comparison with the 
other Muller elements. Muller element E appears to be the most susceptible to 
permutation. 

According to Figure [20], the F element of D. willistoni (which is not among
the species we considered) has been absorbed in the E-element of D. willistoni. It 
would be interesting to "distill" the D. willistoni F-element from the D. willistoni E- 
element, and compare its level of permutation relative to the F-element of the eight 
species in our study. Assuming the stability coefficients above it may be possible to 
obtain from the current permutation state of the distilled "D. willistoni F-element" 
an estimate of how long ago absorption of the F-element into the E-element took 
place. 

Similarly, by separating the treatment of the B and C elements, and calculating
the corresponding stability coefficients of these elements, and distilling the
B-element components and the C-element components for D. ananassae, one may
be able to estimate the time since these transpositions occurred. Figure [20] also
indicates that part of D. pseudoobscura's Muller A element was transposed to its
Muller E element. Stability coefficients may be useful in estimating how long ago
this transposition occurred. An investigation of the structural properties of the
chromosomes involved in these interchromosomal translocations may also reveal if
any DNA motifs promote these translocations.

^This is algorithm 4.1 in 0. A good exposition is also given in Chapter 27 of [5], available
online at www.evolution-textbook.org.
^http://insects.eugenes.org/DroSpeGe/

Figure 6. Phylogenies based on UPGMA and the ciliate operations

The differences in the phylogenies for the different chromosomal domains in the
considered species suggest the possibility of inferring, from Mendelian inheritance
hypotheses and diploidy of the fruitfly genomes, interbreeding among ancestor
species that would produce the observed chromosomal configurations.

We relied on the UPGMA algorithm for constructing our phylogenies. Other
clustering techniques such as neighbor joining, the NGP algorithm in [6], or
several other algorithms, as for example in [7], may reveal finer details than the
technique applied here.

There have been studies that take into account only reversals, for example [3] or
[12] among many, or block interchanges or transpositions, as for example [8], [15] or
[23]. In all these studies there were no constraints on the conditions under which
a reversal or a block interchange may occur. Indeed, any given permutation can
be transformed into another given permutation by applications of unconstrained
reversals and block interchanges. The fine structural analysis of the permutations
in the chromosomes of twelve fruitfly species undertaken in [6] suggests that
breakpoints in the observed permutations of fruitfly genomes are not random (see the
"Discussion" section of [6]), and this is apparently also the conjecture for observed
permutations in genomes of some other insects and mammals. These observations
suggest that modeling genome permutations ought to include constraints on the
basic operations hypothesized to generate these permutations.

One of our findings in using ciliate operations to compute the distances between 
pairs of species is the occurrence of permutations which are not reducible to each 
other by these ciliate operations. Thus, in contrast to the case for unrestricted block 
interchanges and unrestricted reversals, not all permutations are invertible by
context directed block interchanges and reversals. When our algorithm terminates
with a destination of length 4 instead of 2, this indicates that the two permutations
involved in the distance measure require an additional transposition to complete
the transformation. Though we have not done so in our current paper, the fact of
uninvertibility by ciliate decryptome operations could be taken as an additional pa- 
rameter in measuring evolutionary distance. Instead, in this paper we counted this 
additional transposition needed at the end as a single step towards the distance. An 
argument can be made that the necessity of this additional transposition should be 
accounted for more significantly in computing evolutionary distance. It also raises 
the question of determining an easily applicable characterization of permutations 
that are invertible by constrained block interchanges or reversals. This problem of 
mathematically characterizing permutations that are invertible by context directed 
operations has been taken up in subsequent work [2]. 

Finally, although the HNS algorithm finds in polynomial time the data needed 
to construct a distance matrix, we do not propose that this algorithm finds optimal 
data in the following sense: When one permutation can be transformed to another 
by means of context directed reversals and block interchanges, what is the least 
number of these operations needed for such a transformation? Some partial results 
on this question have been obtained in [2], but a complete answer is not known.

Appendix I: The distance matrices underlying the application of UPGMA to the
five chromosomes of eight fruitfly species.

Figure 7. Muller Element A: r:s denotes number of cdr : number of cds

Figure 8. Distance matrix for Muller Element A

Figure 9. Muller Element B/C

Figure 10. Distance matrix for Muller Element B/C

Figure 11. Muller Element D

Figure 12. Distance matrix for Muller Element D

Figure 13. Muller Element E

Figure 14. Distance matrix for Muller Element E

Figure 15. Muller Element F

Figure 16. Distance matrix for Muller Element F

Figure 17. Whole Genome totals

Figure 18. Distance matrix for the Whole Genome

Appendix II: The reference phylogeny used to calibrate the 
phylogenetic trees for the Muller elements. 




Figure 19. Drosophila-phylogeny, from http://insects.eugenes.org/DroSpeGe/

Appendix III: Visual illustration of permuted Muller elements across species.

[Figure: Muller elements A, B, C, D, E, F across species]

Figure 20. Figure 3 of [6]: Note the B and C element transpositions
for D. erecta, the absorption of D. willistoni's F-element into
its E-element, and the transposition of a segment of D. pseudoobscura's
A element to its D-element.



